A recent method for causal discovery is in many cases able to infer whether X causes Y or Y causes X for just two observed variables X and Y. It is based on the observation that there exist (non-Gaussian) joint distributions P(X,Y) for which Y may be written as a function of X up to an additive noise term that is independent of X, while no such model exists from Y to X. Whenever this is the case, one prefers the causal model X → Y.

Here we justify this method by showing that the causal hypothesis Y → X is unlikely because it requires a specific tuning between P(Y) and P(X|Y) to generate a distribution that admits an additive noise model from X to Y. To quantify the amount of tuning required, we derive lower bounds on the algorithmic information shared by P(Y) and P(X|Y). This way, our justification is consistent with recent approaches to using algorithmic information theory for causal reasoning. We extend this principle to the case where P(X,Y) almost admits an additive noise model. Our results suggest that the above conclusion is more reliable if the complexity of P(Y) is high.
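To make the inference rule concrete, below is a minimal sketch (not the paper's own algorithm or proof technique) of additive-noise-model-based direction inference: fit a nonparametric regression in each direction and prefer the direction whose residuals look independent of the putative cause. The regressor, the HSIC-style dependence score, its bandwidth, and the simulated data are all illustrative assumptions; numpy and scikit-learn are assumed available.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def hsic(a, b, sigma=1.0):
    """Biased HSIC estimate with Gaussian kernels: a dependence
    measure that is near zero when a and b are independent."""
    n = len(a)
    def gram(v):
        d = (v[:, None] - v[None, :]) ** 2
        return np.exp(-d / (2 * sigma ** 2))
    K, L = gram(a), gram(b)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def residual_dependence(cause, effect):
    """Fit effect = f(cause) + noise nonparametrically, then score
    how dependent the residuals are on the putative cause."""
    f = KNeighborsRegressor(n_neighbors=10).fit(cause[:, None], effect)
    residuals = effect - f.predict(cause[:, None])
    return hsic(cause, residuals)

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)                 # non-Gaussian cause
y = x ** 3 + 0.2 * rng.uniform(-1, 1, 500)  # additive non-Gaussian noise

# An additive noise model fits from X to Y, so the forward residuals
# should be (nearly) independent of X; no such model exists from Y to X.
score_xy = residual_dependence(x, y)
score_yx = residual_dependence(y, x)
print("X → Y" if score_xy < score_yx else "Y → X")
```

In this toy setting the forward dependence score is markedly smaller than the backward one, matching the abstract's rule of preferring the direction that admits an additive noise model.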